Acceleration of First and Higher Order Recurrences on Processors with Instruction Level Parallelism
Abstract
This report describes parallelization techniques for accelerating a broad class of recurrences on processors with instruction-level parallelism. We introduce a new technique, called blocked back-substitution, which has a lower operation count and higher performance than previous methods. Blocked back-substitution requires unrolling the innermost loop and optimizing its iterations non-symmetrically. We present metrics that characterize the performance of software-pipelined loops and compare these metrics across a range of height-reduction techniques and processor architectures.
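To make the idea of height reduction concrete, here is a minimal sketch for a first-order linear recurrence x[i] = a[i] * x[i-1] + b[i]. It is an illustration under stated assumptions, not the report's own algorithm: the function names, the block size, and the way the block is split into a composed update plus ordinary recurrence steps are all hypothetical. The point it shows is that the coefficients A and C of the composed map do not depend on x, so only one multiply-add per block sits on the loop-carried dependence chain.

```python
def recurrence_naive(a, b, x0):
    """Reference: x[i] = a[i] * x[i-1] + b[i], one dependent multiply-add per element."""
    x = x0
    out = []
    for ai, bi in zip(a, b):
        x = ai * x + bi
        out.append(x)
    return out


def recurrence_blocked(a, b, x0, block=4):
    """Hypothetical blocked variant (an assumption, not the report's exact method).

    For each block, the map from the block's input x to its last value is
    composed into x_last = A * x + C. A and C depend only on a[] and b[],
    so on a parallel machine they can be computed off the critical path.
    Intermediate values use the original recurrence starting from the
    block's input, so they do not lengthen the loop-carried chain either.
    """
    out = []
    x = x0
    n = len(a)
    for start in range(0, n, block):
        end = min(start + block, n)

        # Compose the block's updates: independent of x.
        A, C = 1, 0
        for i in range(start, end):
            A, C = a[i] * A, a[i] * C + b[i]

        # Intermediate elements via the ordinary recurrence.
        y = x
        for i in range(start, end - 1):
            y = a[i] * y + b[i]
            out.append(y)

        # Single multiply-add on the loop-carried chain per block.
        x = A * x + C
        out.append(x)
    return out


if __name__ == "__main__":
    a = [2, 3, 1, 5, 2, 4, 3]
    b = [1, 0, 2, 1, 3, 1, 2]
    print(recurrence_naive(a, b, 1))
    print(recurrence_blocked(a, b, 1, block=3))
```

Both functions return the same sequence for integer inputs. With floating-point inputs the blocked form reassociates the arithmetic, so results may differ in the last bits.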
Similar resources
Complexity Effective ASIP Architectures for Network Processing and Multimedia Acceleration
1 Processor Design; 1.1 Technology Trends; 1.2 Application Trends; 1.3 Choice of Implementation Platforms; 1.4 ASIP Design Methodologies; 1.5 Complexity Effective Desi...
A predecoding technique for ILP exploitation in Java processors
Java processors have been introduced to offer hardware acceleration for Java applications. They execute Java bytecodes directly in hardware. However, the stack nature of the Java virtual machine instruction set imposes a limitation on the achievable execution performance. In order to exploit instruction level parallelism and allow out of order execution, we must remove the stack completely. Thi...
A Ubiquitous Processor Built-in a Waved Multifunctional Unit
In developing cutting-edge VLSI processors, parallelism is one of the most important global standard strategies to achieve power-conscious high performance. These features are more critical for ubiquitous systems with great demands for multimedia mobile processing. Thus, one of the most important issues for ubiquitous systems is instruction scheduling, because floating point units indispensable for...
Evaluating Compiler Support for Complexity Effective Network Processing
Statically scheduled processors are known to enable low complexity hardware implementations that lead to reduced design and verification time. However, statically scheduled processors are critically dependent on the compiler to exploit instruction level parallelism and deliver higher performance. In order to ascertain the suitability of statically scheduled processors for network processing (wh...
Superscalar instruction issue
Clearly, instruction issue and execution are closely related: The more parallel the instruction execution, the higher the requirements for the parallelism of instruction issue. Thus, we see the continuous and harmonized increase of parallelism in instruction issue and execution. This article focuses on superscalar instruction issue, tracing the way parallel instruction execution and issue have i...